import math
import numpy as np
from PIL import Image
from PIL import ImageFilter
When it comes to image preprocessing in Python, there is one main package every user should know: the Python Imaging Library (PIL), which today lives on as the fork named Pillow. Of course, there are packages like OpenCV, but those are really meant for computer vision tasks once you have the images ready.
im = Image.open('ball.jpeg')
display(im)

Sometimes we only need to do basic manipulations to our image. The most common of these is cropping. For example, we might want to crop out the ball.
# box = (left, upper, right, lower)
box = (700, 100, 1100, 500)
ball = im.crop(box)
display(ball)
Once we have our cropped image, we sometimes want to apply transformations to it to generate an augmented set of images for our training set. This is easily accomplished with PIL.
def scale_and_rotate_image(im, sx, sy, deg_ccw):
    # paste onto a white RGBA canvas so rotation corners aren't transparent
    im_orig = im
    im = Image.new('RGBA', im_orig.size, (255, 255, 255, 255))
    im.paste(im_orig)

    w, h = im.size
    angle = math.radians(-deg_ccw)
    cos_theta = math.cos(angle)
    sin_theta = math.sin(angle)

    # bounding box of the scaled-then-rotated image
    scaled_w, scaled_h = w * sx, h * sy
    new_w = int(math.ceil(math.fabs(cos_theta * scaled_w) + math.fabs(sin_theta * scaled_h)))
    new_h = int(math.ceil(math.fabs(sin_theta * scaled_w) + math.fabs(cos_theta * scaled_h)))

    # centers of the old and new images
    cx = w / 2.
    cy = h / 2.
    tx = new_w / 2.
    ty = new_h / 2.

    # inverse affine coefficients: transform() maps each output pixel
    # back to its source location in the input image
    a = cos_theta / sx
    b = sin_theta / sx
    c = cx - tx * a - ty * b
    d = -sin_theta / sy
    e = cos_theta / sy
    f = cy - tx * d - ty * e

    return im.transform(
        (new_w, new_h),
        Image.AFFINE,
        (a, b, c, d, e, f),
        resample=Image.BILINEAR
    )
rotated_ball = scale_and_rotate_image(ball, 1.0, 2.0, 90)
display(rotated_ball)
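Hand-building the affine matrix gives full control, but for many augmentations PIL's built-in resize() and rotate() get you there faster. A minimal sketch (using a synthetic image so the snippet stands alone):

```python
from PIL import Image

# synthetic stand-in for the cropped ball
im = Image.new('RGB', (400, 400), (220, 120, 40))

stretched = im.resize((im.width, im.height * 2))  # sy = 2.0
rotated = stretched.rotate(90, expand=True)       # 90 degrees CCW
print(rotated.size)  # (800, 400)
```

With expand=True, rotate() grows the canvas to fit the rotated image instead of clipping the corners.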
Computers don't "see" images the same way we do. In school we are taught that the primary colors are red, yellow, and blue, and that these can mix into all other colors. We represent colors to computers in a similar way: every color is a combination of 3 or 4 channels. The most common scheme is RGB (red, green, blue), often extended with an alpha channel for transparency (RGBA).
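To make the channel idea concrete, here is a quick sketch of what one RGBA pixel looks like numerically (a tiny synthetic image, since the values are what matter):

```python
from PIL import Image

# a 2x2 image of opaque red pixels: full red, no green/blue, alpha 255
im = Image.new('RGBA', (2, 2), (255, 0, 0, 255))
print(im.mode)               # RGBA
print(im.getpixel((0, 0)))   # (255, 0, 0, 255)
```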
Can we use this to automate the search of the ball?
r, g, b = im.split()
display(b)
def find_largest_patch(im, patch_w, patch_h):
    w, h = im.size
    arr = np.zeros((w, h))
    thresh = 200
    # threshold the channel: 1 where the pixel is bright, 0 elsewhere
    for i in range(w):
        for j in range(h):
            if im.getpixel((i, j)) > thresh:
                arr[i, j] = 1
    # brute force -- very slow!
    largest = 0
    indices = (0, 0)
    for i in range(w - patch_w):
        for j in range(h - patch_h):
            patch_sum = arr[i:i+patch_w, j:j+patch_h].sum()
            if patch_sum > largest:
                largest = patch_sum
                indices = (i, j)
    return indices[0], indices[1], patch_w, patch_h
i, j, w, h = find_largest_patch(b, 200, 200)
box = (i, j, i+w, j+h)
ball = im.crop(box)
display(ball)
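The brute-force scan above recomputes every patch sum from scratch, which is O(w·h·patch_w·patch_h). A summed-area table (integral image) makes each patch sum O(1) after a single pass. A sketch of the idea, assuming the thresholded mask is already a NumPy array (`find_largest_patch_fast` is a hypothetical helper, not part of PIL):

```python
import numpy as np

def find_largest_patch_fast(mask, patch_w, patch_h):
    # summed-area table: S[i, j] = sum of mask[:i, :j]
    S = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1))
    S[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    # every patch sum at once via the four inclusion-exclusion corners
    sums = (S[patch_w:, patch_h:] - S[:-patch_w, patch_h:]
            - S[patch_w:, :-patch_h] + S[:-patch_w, :-patch_h])
    i, j = np.unravel_index(sums.argmax(), sums.shape)
    return i, j

mask = np.zeros((10, 10))
mask[4:8, 4:8] = 1            # a bright 4x4 blob
print(find_largest_patch_fast(mask, 4, 4))  # (4, 4)
```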
As mentioned, RGB is the most common color model and is well suited to finding very bright colors that contain a lot of one primary (an orange ball, for instance, has a lot of red). But what if we want to isolate something with a fairly "mixed" color, such as the barn in this example?
im = Image.open('grand_tetons.jpg')
display(im)

r, g, b = im.split()
display(g)
As I mentioned earlier, computers don't "see" color, so they are not restricted to thinking in terms of the primary colors. They can change their "base" colors to something else. In printing, for example, the CMYK model (cyan, magenta, yellow, key/black) is used.
We can use these different systems to help our algorithms detect features in our images.
im = im.convert('CMYK')
c, m, y, k = im.split()
display(c)
In this case, CMYK doesn't help us very much, but we can do some fun stuff like making the magenta channel more intense and seeing the effect. We can use the .point() method, which is similar to Pandas' apply() function.
m = m.point(lambda i: i * 1.2)
new_im = Image.merge('CMYK', (c,m,y,k)).convert('RGB')
display(new_im)
Of course, there is also the grayscale mode, 'L', but the last color model I want to talk about is HSV (hue, saturation, value).
im = Image.open('grand_tetons.jpg')
im = im.convert('HSV')
h, s, v = im.split()
display(v)
v = v.point(lambda i: i * 1.2)
new_im = Image.merge('HSV', (h,s,v)).convert('RGB')
display(new_im)
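Why prefer HSV for brightening? Scaling V changes brightness while leaving the hue untouched, whereas scaling raw RGB channels can shift the perceived color. A sketch using the standard library's colorsys (which works in the 0-1 range):

```python
import colorsys

# an orange-ish pixel, in the 0-1 range colorsys expects
r, g, b = 0.8, 0.4, 0.2
h, s, v = colorsys.rgb_to_hsv(r, g, b)

# brighten by scaling V only, then convert back and re-check the hue
r2, g2, b2 = colorsys.hsv_to_rgb(h, s, min(1.0, v * 1.2))
h2, _, _ = colorsys.rgb_to_hsv(r2, g2, b2)
print(round(h, 4), round(h2, 4))  # hue unchanged
```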
im = Image.open('grand_tetons.jpg')
out = im.filter(ImageFilter.FIND_EDGES)
display(out)
im = Image.open('ball.jpeg')
display(im)
out = im.filter(ImageFilter.EDGE_ENHANCE)
display(out)
We can even apply our own filters if we know how to build them. This is done using kernels.
im = Image.open('grand_tetons.jpg')
# emboss-like kernel
km = (
    -2, -1, 0,
    -1, 1, 1,
    0, 1, 2
)
# edge-detection kernel (swap this in for km below to try it)
edge_km = (
    -1, -1, -1,
    -1, 8, -1,
    -1, -1, -1
)
# divide by the kernel sum to preserve brightness, guarding against zero
scale = 1 if abs(sum(km)) == 0 else abs(sum(km))
k = ImageFilter.Kernel(
    size=(3, 3),
    kernel=km,
    scale=scale,
    offset=0
)
out = im.filter(k)
display(out)
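To see what ImageFilter.Kernel is actually computing, here is a hand-rolled 3x3 convolution on a NumPy array (`convolve3x3` is just an illustrative helper): each output pixel is the weighted sum of its 3x3 neighborhood.

```python
import numpy as np

edge_km = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])

def convolve3x3(arr, k):
    # slide the kernel over the array; each output pixel is the
    # elementwise product of the kernel and the 3x3 window, summed
    out = np.zeros((arr.shape[0] - 2, arr.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (arr[i:i+3, j:j+3] * k).sum()
    return out

# a perfectly flat region has no edges, so the edge kernel returns 0
flat = np.full((5, 5), 100.0)
print(convolve3x3(flat, edge_km))  # all zeros
```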
The last filter I want to talk about is the mode filter. It simply takes the modal (most common) pixel value in every NxN box in the image. This can compress images and remove noise.
im = Image.open('ball.jpeg')
display(im)
out = im.filter(ImageFilter.ModeFilter(5))
display(out)
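The mode filter's core operation is just "most common value in the window." A sketch with the standard library's statistics.mode, on the raw numbers a single window might contain:

```python
from statistics import mode

# a flat 5x5 neighborhood of value 10 with one noise spike at 255;
# ModeFilter(5) would replace the spike with the neighborhood's mode
window = [10] * 24 + [255]
print(mode(window))  # 10
```

This is why the mode filter is good at wiping out isolated "salt-and-pepper" noise: a lone outlier can never be the most common value in its window.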
The last thing I want to mention is that a lot of image preprocessing is done for the purpose of data augmentation. That is, we may only have 100 images but want to give our training algorithm many more, so that the model is better trained. We can rotate, crop, and stretch images to generate many variants. But as I keep mentioning, computers don't see color. To them, the difference between a black pixel (0, 0, 0) and a white pixel (255, 255, 255) is just arithmetic on numbers, the same kind of difference as between a blue (0, 0, 255) and a slightly dimmer blue (0, 0, 125) pixel. Thus, we sometimes augment our images with random noise so that the algorithm can learn that minor variations in the pixels don't matter.
im = Image.open('ball.jpeg')
display(im)
# .point() evaluates the function once per possible value (0-255), not once
# per pixel, so use NumPy to add truly per-pixel Gaussian noise
arr = np.asarray(im).astype(float)
out = Image.fromarray(np.clip(arr + np.random.normal(0, 1, arr.shape), 0, 255).astype(np.uint8))
display(out)
These images look identical to us but are very "far" apart to the computer.
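To put numbers on "far apart": one simple way to measure the difference between two pixels is the Euclidean distance between them as 3-vectors (a sketch; real models use many notions of distance):

```python
import numpy as np

def pixel_dist(p, q):
    # straight-line distance between two pixels treated as 3-vectors
    return float(np.linalg.norm(np.array(p, float) - np.array(q, float)))

print(pixel_dist((0, 0, 0), (255, 255, 255)))  # ~441.7: black vs white
print(pixel_dist((0, 0, 255), (0, 0, 125)))    # 130.0: blue vs dimmer blue
```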